Data analysis

COVID-19 Analysis

This short report analyzes some of the COVID-19 pandemic data published in the repository of the Johns Hopkins CSSE.

Preliminaries

We have developed a simple Python module, hedera_covid.py, with a set of functions for visualizing the available data. Both the module and the data are available in the repository. We periodically update the local datasets from those available online.

import numpy as np
from hedera_covid import DataHandler, plot_death_rate, plot_daily_cases, plot_confirmed_cases

# load data
path_confirmed = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
path_death = '../../Data/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'

First of all, we create an object of type DataHandler. This class handles the internal operations on the dataset, so that the data are clean and ready for plotting.

covid_data = DataHandler(data_confirmed_path = path_confirmed,
                         data_death_path = path_death)

covid_data.confirmed.head()
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 ... 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20
0 NaN Afghanistan 33.0000 65.0000 0 0 0 0 0 0 ... 174 237 273 281 299 349 367 423 444 484
1 NaN Albania 41.1533 20.1683 0 0 0 0 0 0 ... 243 259 277 304 333 361 377 383 400 409
2 NaN Algeria 28.0339 1.6596 0 0 0 0 0 0 ... 716 847 986 1171 1251 1320 1423 1468 1572 1666
3 NaN Andorra 42.5063 1.5218 0 0 0 0 0 0 ... 376 390 428 439 466 501 525 545 564 583
4 NaN Angola -11.2027 17.8739 0 0 0 0 0 0 ... 7 8 8 8 10 14 16 17 19 19

5 rows × 83 columns

Initialize list of countries

We can now retrieve the data of any country that we want to look at. This can be done using

my_country = covid_data.get_country('Name of the country')

Here, 'Name of the country' is the name given in the Country/Region column of the covid_data.confirmed dataframe.

For example:

italy = covid_data.get_country('Italy')

This creates a dictionary with the following keys:

italy.keys()
dict_keys(['name', 'dates', 'confirmed', 'deaths', 'daily_new_cases', 'daily_deaths', 'start', 'start_death'])
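
To make the keys more concrete, here is a minimal sketch of how such a dictionary could be assembled from cumulative series. It is not the code of hedera_covid.py: the function build_country_record, its threshold parameter, and the meaning given to 'start' and 'start_death' (first day reaching a case threshold, first day with a reported death) are assumptions made for illustration only.

```python
def build_country_record(name, dates, confirmed, deaths, threshold=100):
    """Assemble a per-country dictionary from cumulative series (illustrative only)."""
    # Daily increments: difference between consecutive cumulative values.
    daily_new_cases = [confirmed[0]] + [b - a for a, b in zip(confirmed, confirmed[1:])]
    daily_deaths = [deaths[0]] + [b - a for a, b in zip(deaths, deaths[1:])]

    def first_index_at_least(series, value):
        # Index of the first day on which the series reaches `value`, or None.
        return next((i for i, v in enumerate(series) if v >= value), None)

    return {
        'name': name,
        'dates': list(dates),
        'confirmed': list(confirmed),
        'deaths': list(deaths),
        'daily_new_cases': daily_new_cases,
        'daily_deaths': daily_deaths,
        # ASSUMPTION: 'start' is the first day with at least `threshold` cases.
        'start': first_index_at_least(confirmed, threshold),
        # ASSUMPTION: 'start_death' is the first day with a reported death.
        'start_death': first_index_at_least(deaths, 1),
    }


# Made-up numbers, not real data.
record = build_country_record(
    'Testland',
    ['1/22/20', '1/23/20', '1/24/20', '1/25/20'],
    confirmed=[0, 50, 120, 300],
    deaths=[0, 0, 1, 4],
)
print(record['daily_new_cases'])  # [0, 50, 70, 180]
```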

We can also create a list of countries

covid_data.add_country('Name of the country')

that we can analyze together.

my_countries = ['Italy','Spain','Germany','Austria','France','United Kingdom','US','Sweden','Netherlands']

for c in my_countries:
    covid_data.add_country(c)

Starting from this object, we can now use some functions implemented in our module to visualize the data.

Note: this is ongoing work!

First, we have to collect the data. For the reported cases we use this function:

data = covid_data.get_confirmed_data(start_date=0,n_smooth=7,rescale=True)

Then we can create a plotly bar chart (for example) and display it.

# for plotly
from plotly.offline import init_notebook_mode, plot
from IPython.display import display, HTML
init_notebook_mode(connected=True)

fig = {
    "data": data,
    "layout": {"title": {"text": "Confirmed Cases (rescaled in each country)"}}
}

# write the figure to an HTML file, then embed that file in the notebook
plot(fig, filename = 'figure.html')
display(HTML(filename='figure.html'))

Remark

We can also load all available countries at once.

For this, we will first create a new handler (so that we do not mix up data):

# first create a new handler
covid_data_full = DataHandler(data_confirmed_path = path_confirmed,
                              data_death_path = path_death)

Now we can get all country names. Note that we have to use np.unique, since countries that are reported by province or state appear on multiple rows (this is how JHU stores the data).

# now get all names
all_country_names = covid_data_full.confirmed['Country/Region']
all_country_names = np.unique(all_country_names)
for n in all_country_names:
    covid_data_full.add_country(n)
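
The reason names repeat can be seen on a tiny sample in the same wide layout as the JHU files. The sketch below uses only the standard library and made-up numbers. Summing the provinces of a country into one series is an assumption about what one would typically want; whether DataHandler aggregates this way internally is not stated in this report.

```python
import csv
import io

# Tiny made-up sample in the JHU wide layout (numbers are illustrative, not real data).
sample = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Italy,43.0,12.0,0,2
Hubei,China,30.9,112.2,444,444
Beijing,China,40.1,116.4,14,22
"""

rows = list(csv.DictReader(io.StringIO(sample)))
meta_columns = ('Province/State', 'Country/Region', 'Lat', 'Long')
date_columns = [c for c in rows[0] if c not in meta_columns]

# Sum the series of all rows that share the same Country/Region.
totals = {}
for row in rows:
    series = [int(row[d]) for d in date_columns]
    country = row['Country/Region']
    if country in totals:
        totals[country] = [a + b for a, b in zip(totals[country], series)]
    else:
        totals[country] = series

unique_names = sorted(totals)  # same result as np.unique on the name column
print(unique_names)     # ['China', 'Italy']
print(totals['China'])  # [458, 466]
```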

# alternative: build the figure with plotly.graph_objects
import plotly.graph_objects as go
init_notebook_mode(connected=True)

fig = go.Figure()

for c in covid_data_full.countries:
    fig.add_trace(go.Bar(x=c['dates'],
                         y=c['confirmed'],
                         name=c['name'],
                        )
                 )

fig.update_layout(
    title='# Confirmed Cases',
    xaxis_tickfont_size=14,
    yaxis=dict(
        title=dict(text='Cases', font=dict(size=16)),
        tickfont_size=14,
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 # gap between bars of the same location coordinate.
)

plot(fig, filename = 'figure.html')
display(HTML(filename='figure.html'))

We can use plotly to visualize all countries at once. Double-click on the legend name to isolate one single country.

Number of confirmed cases

Daily variation

The following figure shows the number of confirmed cases. Note that this number depends strongly on the testing protocols and testing capacity of each country.

For this, we can use the function

plot_daily_cases(countries,start_date,n_smooth,rescale)

Parameters:

  • countries: a list of countries
  • start_date: day on which the plot starts (0 = January 22)
  • n_smooth: smoothing of the data (data will be averaged over n_smooth days; 7 is usually a good choice)
  • rescale: set to True to rescale the curves to the same start. In this case start_date is not used.

import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 14})
plot_daily_cases(covid_data.countries,start_date=0,n_smooth=7,rescale=True)
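
The two preprocessing options can be sketched in a few lines. This is an illustration under stated assumptions, not the implementation in hedera_covid.py: moving_average and align_to_start are hypothetical helpers, the smoothing is assumed to be a trailing average, and "rescaling to the same start" is assumed to mean dropping the days before a country's start index so that all curves share a common day 0.

```python
def moving_average(values, n_smooth):
    """Trailing average over up to n_smooth days; n_smooth <= 1 means no smoothing."""
    if n_smooth <= 1:
        return list(values)
    smoothed = []
    for i in range(len(values)):
        # Window of the current day and up to n_smooth - 1 previous days.
        window = values[max(0, i - n_smooth + 1): i + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def align_to_start(values, start_index):
    """Drop the days before start_index so that curves share a common day 0."""
    return list(values[start_index:])


daily = [0, 0, 3, 9, 6, 12]  # made-up daily new cases
print(moving_average(daily, 3))  # [0.0, 0.0, 1.0, 4.0, 6.0, 9.0]
print(align_to_start(daily, 2))  # [3, 9, 6, 12]
```

A trailing window keeps the most recent value aligned with the most recent day, at the cost of a small lag in the smoothed curve.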

Total cases (reported)

We can look at the total number of cases using the function

plot_confirmed_cases(countries,start_date,n_smooth,rescale,log_scale)

Parameters:

  • countries: a list of countries
  • start_date: day on which the plot starts (0 = January 22)
  • n_smooth: smoothing of the data (data will be averaged over n_smooth days; 7 is usually a good choice)
  • rescale: set to True to rescale the curves to the same start. In this case start_date is not used.
  • log_scale: set to True to use a logarithmic scale for the y-axis

plot_confirmed_cases(covid_data.countries,start_date=30,n_smooth=0,rescale=False,log_scale=True)
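
Since start_date counts days from January 22, 2020 (the first date column of the dataset), it can help to translate an index into a calendar date. The helper day_index_to_date below is hypothetical and not part of hedera_covid.py.

```python
from datetime import date, timedelta

FIRST_DAY = date(2020, 1, 22)  # first date column (1/22/20) in the time series


def day_index_to_date(start_date):
    """Calendar date corresponding to a start_date index (0 = January 22, 2020)."""
    return FIRST_DAY + timedelta(days=start_date)


print(day_index_to_date(0))   # 2020-01-22
print(day_index_to_date(30))  # 2020-02-21
```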

Mortality Rate

We can also look at the mortality rate over time, based on the officially reported numbers, for a selected set of countries.

import matplotlib.pyplot as plt
plt.rcParams.update({'font.size': 14})
plot_death_rate(covid_data.countries,start_date = 32)
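
As a rough illustration of the quantity being plotted, the sketch below computes the ratio of reported deaths to reported confirmed cases for each day. That plot_death_rate uses exactly this definition is an assumption, and the helper death_rate is hypothetical.

```python
def death_rate(confirmed, deaths):
    """Deaths divided by confirmed cases, per day; None where no cases are reported yet."""
    return [d / c if c > 0 else None for c, d in zip(confirmed, deaths)]


# Made-up cumulative numbers, not real data.
confirmed = [0, 10, 100, 200]
deaths = [0, 0, 2, 10]
print(death_rate(confirmed, deaths))  # [None, 0.0, 0.02, 0.05]
```

Like the number of confirmed cases, this ratio depends on how much testing each country performs, so values are not directly comparable across countries.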
